1 Universe

With the S&P 500 closing at a record high on last Friday (2 Jul 2021), this highlights that large cap companies are returning back to performing well in the market as the economy gradually recovers from COVID-19. The YTD performance of leading stock market indices such as the S&P 500 and NASDAQ 100 are up by ~3.45% and ~5.55% respectively - Reference.

Moreover, with large cap stocks often seen to provide higher quality and stability, they are said to be less volatile during rough markets. As such, it is worthwhile to look into creating a universe that consists of large cap stocks and relevant ETFs that invest in large cap companies. To ensure greater exposure and diversification, this universe will consider different sectors (Healthcare, Consumer Discretionary, Consumer Staples, Technology, Finance and Energy). Within each sector, the nature of business of the company is also considered.

Amid the COVID-19 pandemic, China emerged stronger than others. While the U.S. GDP fell by 2.3%, China’s grew by 2.3% in 2020 - Reference. The divergence between both countries in terms of GDP is reducing. It is thus projected that China will overtake the U.S. as the world’s largest economy in the near future. Hence, we will also seek to include large cap stocks from the China Market. Over the past 5 years, technology and consumer discretionary industries have been the best performing sectors. Hence, the composition of the universe will consist of more stocks coming from these sectors.

The team has decided to pick 15 assets, of which 10 are equity stocks and 5 are ETFs.

Components in the Team Universe:

Sector Number of Stocks/ETFs Stocks/ETFs Tickers
Healthcare 1 ETF XLV
Consumer Discretionary 4 Equity Stocks TSLA, NKE, HD, BABA
Consumer Staples 2 Equity Stocks PG, COST
Technology 4 Equity Stocks and 2 ETFs MSFT, AAPL, NFLX, ORCL, ARKW, KWEB
Finance 1 ETF XLF
Energy 1 ETF XLE

Here are some of the packages that we are going to use:

1.0.1 Loading of packages

library(tidyquant)
library(plotly)
library(quantmod)
library(timetk)
library(tidyverse)
library(dplyr)
library(fPortfolio)
library(PerformanceAnalytics)
library(ggplot2)
library(xts)
library(zoo)
library(PortfolioAnalytics)
library(viridis)
library(reshape)
library(hrbrthemes)

Import price data and obtain closing price and calculate returns

1.0.2 Adjusted Price and Return

# Get the price data
tickers <- c("XLV","TSLA","NKE","HD","BABA","PG","COST", "MSFT", "AAPL", "NFLX", "ORCL", "ARKW", "KWEB", "XLF", "XLE")
getSymbols(tickers, from="2015-01-01", to="2021-06-30", periodicity = "daily")
##  [1] "XLV"  "TSLA" "NKE"  "HD"   "BABA" "PG"   "COST" "MSFT" "AAPL" "NFLX"
## [11] "ORCL" "ARKW" "KWEB" "XLF"  "XLE"
#Make adjusted price data frame
get.AdPrices <- function(x) {Ad(get(x))}
AdClosePrices <- do.call(merge, lapply(tickers, get.AdPrices)) # all the ETF adjusted prices in one dataframe
head(AdClosePrices)
##            XLV.Adjusted TSLA.Adjusted NKE.Adjusted HD.Adjusted BABA.Adjusted
## 2015-01-02     61.60175        43.862     44.25800    89.22099        103.60
## 2015-01-05     61.28759        42.018     43.54545    87.34912        101.00
## 2015-01-06     61.08116        42.256     43.28929    87.08169        103.32
## 2015-01-07     62.51729        42.190     44.18349    90.06638        102.13
## 2015-01-08     63.58542        42.124     45.20343    92.05903        105.03
## 2015-01-09     63.01996        41.332     44.70510    90.48042        103.02
##            PG.Adjusted COST.Adjusted MSFT.Adjusted AAPL.Adjusted NFLX.Adjusted
## 2015-01-02    74.45441      120.1338      41.34821      24.81924      49.84857
## 2015-01-05    74.10041      118.7680      40.96798      24.12005      47.31143
## 2015-01-06    73.76288      120.3374      40.36669      24.12232      46.50143
## 2015-01-07    74.14981      122.4329      40.87954      24.46056      46.74286
## 2015-01-08    74.99775      123.4847      42.08214      25.40040      47.78000
## 2015-01-09    74.29797      121.5845      41.72844      25.42763      47.04143
##            ORCL.Adjusted ARKW.Adjusted KWEB.Adjusted XLF.Adjusted XLE.Adjusted
## 2015-01-02      39.96646      16.98015      31.30929     17.68642     60.85356
## 2015-01-05      39.40597      16.72110      31.40348     17.31453     58.33619
## 2015-01-06      38.99916      16.57403      31.56360     17.04992     57.47919
## 2015-01-07      39.00821      16.74116      31.87444     17.22871     57.60162
## 2015-01-08      39.24324      17.08795      32.40191     17.48618     58.89475
## 2015-01-09      39.22516      17.05536      32.11933     17.25017     58.42799
#Make return data frame
get.AdReturns <- function(x) {dailyReturn(Ad(get(x)))} #obtain the daily returns
AdCloseReturns <- do.call(merge, lapply(tickers, get.AdReturns))
colnames(AdCloseReturns) <- tickers
head(AdCloseReturns)
##                     XLV         TSLA          NKE           HD        BABA
## 2015-01-02  0.000000000  0.000000000  0.000000000  0.000000000  0.00000000
## 2015-01-05 -0.005099791 -0.042040901 -0.016099937 -0.020980174 -0.02509651
## 2015-01-06 -0.003368251  0.005664215 -0.005882521 -0.003061656  0.02297030
## 2015-01-07  0.023511833 -0.001561956  0.020656379  0.034274577 -0.01151764
## 2015-01-08  0.017085434 -0.001564304  0.023084165  0.022124272  0.02839520
## 2015-01-09 -0.008892997 -0.018801633 -0.011024141 -0.017147759 -0.01913741
##                      PG         COST         MSFT          AAPL        NFLX
## 2015-01-02  0.000000000  0.000000000  0.000000000  0.000000e+00  0.00000000
## 2015-01-05 -0.004754641 -0.011369488 -0.009195804 -2.817153e-02 -0.05089702
## 2015-01-06 -0.004555063  0.013214161 -0.014677025  9.431989e-05 -0.01712054
## 2015-01-07  0.005245633  0.017413334  0.012704807  1.402203e-02  0.00519184
## 2015-01-08  0.011435471  0.008591166  0.029418213  3.842242e-02  0.02218820
## 2015-01-09 -0.009330613 -0.015388320 -0.008405062  1.072306e-03 -0.01545774
##                     ORCL         ARKW         KWEB         XLF          XLE
## 2015-01-02  0.0000000000  0.000000000  0.000000000  0.00000000  0.000000000
## 2015-01-05 -0.0140239838 -0.015255813  0.003008276 -0.02102692 -0.041367768
## 2015-01-06 -0.0103235875 -0.008795591  0.005099021 -0.01528288 -0.014690608
## 2015-01-07  0.0002321076  0.010083666  0.009847893  0.01048621  0.002129971
## 2015-01-08  0.0060249876  0.020714874  0.016548495  0.01494424  0.022449594
## 2015-01-09 -0.0004606654 -0.001907133 -0.008721028 -0.01349695 -0.007925443
hist.return <- AdCloseReturns[rowSums(is.na(AdCloseReturns))==0, ]
hist.return.ts <- as.timeSeries(hist.return)

2 Feature Engineering

In this section, we will be looking into some features to determine if they are good predictors for the returns of each of the stocks in our universe.

2.0.1 Obtain relevant data

  • Directly from Yahoo Finance
# Get the price data of VIX, US Dollar Index from yahoo finance
get_yahoo <- function(tk) {
  
  df <- getSymbols(tk, src = 'yahoo', auto.assign = FALSE, from = '2015-01-01', to='2021-06-30')
  
  df <- df %>%
    as_tibble() %>%
    mutate(date = index(df))
  
  colnames(df) <- c("open", "high", "low", "close", "volume", "adjusted_close", "date", "ticker")
  
  return(df)
  
}

VIX <- get_yahoo('^VIX') %>%
  select(date, adjclose = adjusted_close)

Oil <- get_yahoo('BZ=F') %>%
  select(date, adjclose = adjusted_close)

US.dollar <- get_yahoo('DX-Y.NYB') %>%
  select(date, adjclose = adjusted_close)

gold <- get_yahoo('GC=F') %>%
  select(date, adjclose = adjusted_close)

sp500 <- get_yahoo('^GSPC') %>%
  select(date, adjclose = adjusted_close)

djia <- get_yahoo('^DJI') %>%
  select(date, adjclose = adjusted_close)

sp400 <- get_yahoo('^MID') %>%
  select(date, adjclose = adjusted_close)

nasdaq <- get_yahoo('^NDX') %>%
  select(date, adjclose = adjusted_close)

ixco <- get_yahoo('^IXCO') %>%
  select(date, adjclose = adjusted_close)

phlx_semiconductor <- get_yahoo('^SOX') %>%
  select(date, adjclose = adjusted_close)

indicators <- list(VIX, Oil, US.dollar,gold,sp500,djia,sp400,nasdaq,ixco,phlx_semiconductor) %>% reduce(left_join, by = "date")
colnames(indicators) <- c("date","vix","oil","us_dollar_index","gold", "sp500",
                          "djia","sp400","nasdaq","ixco","sox")
#check if there is any NA values
sum(is.na(indicators$us_dollar_index))
## [1] 12
sum(is.na(indicators$oil))
## [1] 17
sum(is.na(indicators$gold))
## [1] 13
# there is NA values, fill it with the previous day price
indicators <-indicators %>% fill(us_dollar_index, .direction = "down")
indicators <-indicators %>% fill(oil, .direction = "down")
indicators <-indicators %>% fill(gold, .direction = "down")

#Calculate returns
indicators <- indicators %>% tq_mutate(select = vix,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'vix_return')

indicators <- indicators %>% tq_mutate(select = oil,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'oil_return')

indicators <- indicators %>% tq_mutate(select = us_dollar_index,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'us_dollar_index_return')

indicators <- indicators %>% tq_mutate(select = gold,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'gold_return')

indicators <- indicators %>% tq_mutate(select = sp500,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'sp500_return')

indicators <- indicators %>% tq_mutate(select = djia,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'djia_return')

indicators <- indicators %>% tq_mutate(select = sp400,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'sp400_return')

indicators <- indicators %>% tq_mutate(select = nasdaq,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'nasdaq_return')

indicators <- indicators %>% tq_mutate(select = ixco,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'ixco_return')

indicators <- indicators %>% tq_mutate(select = sox,
            mutate_fun = periodReturn,
            period = 'daily',
            type = 'arithmetic',                            
            col_rename = 'sox_return')
  • From downloaded csv files
filenames <- list.files(path="C:/Users/marcu/Desktop/NUS BBA4/Y4S2/SKKU ISS/ISS3244/project_data",
    pattern=".*csv")

names <- str_extract(filenames, "^[^\\.:]+")

for(i in names){
  assign(i, read.csv(paste("C:/Users/marcu/Desktop/NUS BBA4/Y4S2/SKKU ISS/ISS3244/project_data/", i, ".csv", sep="")))
}

head(DFF)
##         DATE  DFF
## 1 2015-01-01 0.06
## 2 2015-01-02 0.12
## 3 2015-01-03 0.12
## 4 2015-01-04 0.12
## 5 2015-01-05 0.12
## 6 2015-01-06 0.12
head(fsi)
##         Date OFR.FSI Credit Equity.valuation Safe.assets Funding Volatility
## 1 02/01/2015  -1.369 -0.244           -0.243       0.012  -0.373     -0.521
## 2 05/01/2015  -0.933 -0.192           -0.076       0.044  -0.360     -0.349
## 3 06/01/2015  -0.693 -0.151            0.002       0.100  -0.375     -0.269
## 4 07/01/2015  -0.817 -0.151           -0.040       0.075  -0.354     -0.347
## 5 08/01/2015  -1.168 -0.200           -0.165       0.057  -0.350     -0.510
## 6 09/01/2015  -1.137 -0.177           -0.127       0.090  -0.365     -0.558
##   United.States Other.advanced.economies Emerging.markets
## 1        -0.564                   -0.907            0.102
## 2        -0.388                   -0.712            0.167
## 3        -0.287                   -0.610            0.204
## 4        -0.352                   -0.652            0.187
## 5        -0.489                   -0.827            0.148
## 6        -0.485                   -0.803            0.151
head(ICSA)
##         DATE   ICSA  icsa_change
## 1 01/01/2015 281000             
## 2 01/02/2015 317000  0.128113879
## 3 01/03/2015 269000 -0.151419558
## 4 01/04/2015 269000            0
## 5 01/05/2015 275000  0.022304833
## 6 01/06/2015 275000            0
head(INDPRO)
##         DATE   INDPRO
## 1 2015-01-01 102.8479
## 2 2015-02-01 102.2292
## 3 2015-03-01 101.8945
## 4 2015-04-01 101.2859
## 5 2015-05-01 100.8408
## 6 2015-06-01 100.5063
head(sentiment)
##   ï..Reported.Date Bullish Neutral Bearish Bull.Bear.Spread Bullish.Average
## 1       2015-01-01  51.74%  28.96%  19.31%           32.40%          38.00%
## 2       2015-01-08  41.01%  31.30%  27.70%           13.30%          38.00%
## 3       2015-01-15  46.11%  32.40%  21.50%           24.60%          38.00%
## 4       2015-01-22  37.14%  32.06%  30.79%            6.30%          38.00%
## 5       2015-01-29  44.17%  33.44%  22.39%           21.80%          38.00%
## 6       2015-02-05  35.49%  32.08%  32.42%            3.10%          38.00%
head(T10Y3M)
##         DATE T10Y3M
## 1 2015-01-02   2.10
## 2 2015-01-05   2.01
## 3 2015-01-06   1.94
## 4 2015-01-07   1.93
## 5 2015-01-08   2.00
## 6 2015-01-09   1.96
head(cpi)
##         DATE CPILFESL_PCH
## 1 2015-01-01      0.09475
## 2 2015-02-01      0.15054
## 3 2015-03-01      0.24274
## 4 2015-04-01      0.24548
## 5 2015-05-01      0.14171
## 6 2015-06-01      0.15557
DFF$DATE <- as.Date(DFF$DATE)
colnames(DFF) <- c("date","fed_funds_rate")

fsi$Date <- as.Date(fsi$Date, format="%d/%m/%Y")
fsi_df <- fsi[,1:2]
colnames(fsi_df) <- c("date","osr_fsi")

T10Y3M$DATE <- as.Date(T10Y3M$DATE, format="%Y-%m-%d")
colnames(T10Y3M) <- c("date","term_spread")

cpi$DATE <- as.Date(cpi$DATE, format="%Y-%m-%d")
colnames(cpi) <- c("date","change_in_cpi")

# join those with daily data to the previous 'indicators' dataframe
indicators <- indicators %>% left_join(DFF, by="date") %>% left_join(fsi_df, by="date") %>% 
  left_join(T10Y3M, by="date")

head(indicators) 
## # A tibble: 6 x 24
##   date         vix   oil us_dollar_index  gold sp500   djia sp400 nasdaq  ixco
##   <date>     <dbl> <dbl>           <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
## 1 2015-01-02  17.8  56.4            91.1 1186  2058. 17833. 1451.  4230. 2446.
## 2 2015-01-05  19.9  53.1            91.4 1204. 2021. 17502. 1428.  4161. 2403.
## 3 2015-01-06  21.1  51.1            91.5 1219. 2003. 17372. 1413.  4111. 2369.
## 4 2015-01-07  19.3  51.2            91.9 1211. 2026. 17585. 1431.  4160  2392.
## 5 2015-01-08  17.0  51.0            92.4 1208. 2062. 17908. 1453.  4241. 2453.
## 6 2015-01-09  17.5  50.1            91.9 1216  2045. 17737. 1441.  4213. 2444.
## # ... with 14 more variables: sox <dbl>, vix_return <dbl>, oil_return <dbl>,
## #   us_dollar_index_return <dbl>, gold_return <dbl>, sp500_return <dbl>,
## #   djia_return <dbl>, sp400_return <dbl>, nasdaq_return <dbl>,
## #   ixco_return <dbl>, sox_return <dbl>, fed_funds_rate <dbl>, osr_fsi <dbl>,
## #   term_spread <dbl>
  • Individual Stock Technical Indicators - Bollinger Bands Width, RSI, MACD, PE Ratio
# -------------------------------------------------------------- Bollinger Bands ----------------------------------------------------------

getSymbols(tickers, from="2014-12-01", to="2021-06-30", periodicity = "daily")
##  [1] "XLV"  "TSLA" "NKE"  "HD"   "BABA" "PG"   "COST" "MSFT" "AAPL" "NFLX"
## [11] "ORCL" "ARKW" "KWEB" "XLF"  "XLE"
date_band <- "2015-01-01"

# Repeat this code for all the tickers
xlv_bbands <- BBands(XLV[,c("XLV.High","XLV.Low","XLV.Close")])
xlv_bbands <- xlv_bbands[index(xlv_bbands) >= date_band,1:3]
XLV <- XLV[index(XLV) >= date_band,]
XLV <- cbind(XLV, xlv_bbands)
bb_df_xlv <- as.data.frame(XLV) %>% mutate(xlv_bb_width = (up-dn)/mavg) %>% select(xlv_bb_width)

# ----repeat----
tsla_bbands <- BBands(TSLA[,c("TSLA.High","TSLA.Low","TSLA.Close")])
tsla_bbands <- tsla_bbands[index(tsla_bbands) >= date_band,1:3]
TSLA <- TSLA[index(TSLA) >= date_band,]
TSLA <- cbind(TSLA, tsla_bbands)
bb_df_tsla <- as.data.frame(TSLA) %>% mutate(tsla_bb_width = (up-dn)/mavg) %>% select(tsla_bb_width)

nke_bbands <- BBands(NKE[,c("NKE.High","NKE.Low","NKE.Close")])
nke_bbands <- nke_bbands[index(nke_bbands) >= date_band,1:3]
NKE <- NKE[index(NKE) >= date_band,]
NKE <- cbind(NKE, nke_bbands)
bb_df_nke <- as.data.frame(NKE) %>% mutate(nke_bb_width = (up-dn)/mavg) %>% select(nke_bb_width)

hd_bbands <- BBands(HD[,c("HD.High","HD.Low","HD.Close")])
hd_bbands <- hd_bbands[index(hd_bbands) >= date_band,1:3]
HD <- HD[index(HD) >= date_band,]
HD <- cbind(HD, hd_bbands)
bb_df_hd <- as.data.frame(HD) %>% mutate(hd_bb_width = (up-dn)/mavg) %>% select(hd_bb_width)

baba_bbands <- BBands(BABA[,c("BABA.High","BABA.Low","BABA.Close")])
baba_bbands <- baba_bbands[index(baba_bbands) >= date_band,1:3]
BABA <- BABA[index(BABA) >= date_band,]
BABA <- cbind(BABA, baba_bbands)
bb_df_baba <- as.data.frame(BABA) %>% mutate(baba_bb_width = (up-dn)/mavg) %>% select(baba_bb_width)

pg_bbands <- BBands(PG[,c("PG.High","PG.Low","PG.Close")])
pg_bbands <- pg_bbands[index(pg_bbands) >= date_band,1:3]
PG <- PG[index(PG) >= date_band,]
PG <- cbind(PG, pg_bbands)
bb_df_pg <- as.data.frame(PG) %>% mutate(pg_bb_width = (up-dn)/mavg) %>% select(pg_bb_width)

cost_bbands <- BBands(COST[,c("COST.High","COST.Low","COST.Close")])
cost_bbands <- cost_bbands[index(cost_bbands) >= date_band,1:3]
COST <- COST[index(COST) >= date_band,]
COST <- cbind(COST, cost_bbands)
bb_df_cost <- as.data.frame(COST) %>% mutate(cost_bb_width = (up-dn)/mavg) %>% select(cost_bb_width)

msft_bbands <- BBands(MSFT[,c("MSFT.High","MSFT.Low","MSFT.Close")])
msft_bbands <- msft_bbands[index(msft_bbands) >= date_band,1:3]
MSFT <- MSFT[index(MSFT) >= date_band,]
MSFT <- cbind(MSFT, msft_bbands)
bb_df_msft <- as.data.frame(MSFT) %>% mutate(msft_bb_width = (up-dn)/mavg) %>% select(msft_bb_width)

aapl_bbands <- BBands(AAPL[,c("AAPL.High","AAPL.Low","AAPL.Close")])
date_band <- "2015-01-01"
aapl_bbands <- aapl_bbands[index(aapl_bbands) >= date_band,1:3]
AAPL <- AAPL[index(AAPL) >= date_band,]
AAPL <- cbind(AAPL, aapl_bbands)
bb_df_aapl <- as.data.frame(AAPL) %>% mutate(aapl_bb_width = (up-dn)/mavg) %>% select(aapl_bb_width)

nflx_bbands <- BBands(NFLX[,c("NFLX.High","NFLX.Low","NFLX.Close")])
nflx_bbands <- nflx_bbands[index(nflx_bbands) >= date_band,1:3]
NFLX <- NFLX[index(NFLX) >= date_band,]
NFLX <- cbind(NFLX, nflx_bbands)
bb_df_nflx <- as.data.frame(NFLX) %>% mutate(nflx_bb_width = (up-dn)/mavg) %>% select(nflx_bb_width)

orcl_bbands <- BBands(ORCL[,c("ORCL.High","ORCL.Low","ORCL.Close")])
orcl_bbands <- orcl_bbands[index(orcl_bbands) >= date_band,1:3]
ORCL <- ORCL[index(ORCL) >= date_band,]
ORCL <- cbind(ORCL, orcl_bbands)
bb_df_orcl <- as.data.frame(ORCL) %>% mutate(orcl_bb_width = (up-dn)/mavg) %>% select(orcl_bb_width)

arkw_bbands <- BBands(ARKW[,c("ARKW.High","ARKW.Low","ARKW.Close")])
arkw_bbands <- arkw_bbands[index(arkw_bbands) >= date_band,1:3]
ARKW <- ARKW[index(ARKW) >= date_band,]
ARKW <- cbind(ARKW, arkw_bbands)
bb_df_arkw <- as.data.frame(ARKW) %>% mutate(arkw_bb_width = (up-dn)/mavg) %>% select(arkw_bb_width)

kweb_bbands <- BBands(KWEB[,c("KWEB.High","KWEB.Low","KWEB.Close")])
kweb_bbands <- kweb_bbands[index(kweb_bbands) >= date_band,1:3]
KWEB <- KWEB[index(KWEB) >= date_band,]
KWEB <- cbind(KWEB, kweb_bbands)
bb_df_kweb <- as.data.frame(KWEB) %>% mutate(kweb_bb_width = (up-dn)/mavg) %>% select(kweb_bb_width)

xlf_bbands <- BBands(XLF[,c("XLF.High","XLF.Low","XLF.Close")])
xlf_bbands <- xlf_bbands[index(xlf_bbands) >= date_band,1:3]
XLF <- XLF[index(XLF) >= date_band,]
XLF <- cbind(XLF, xlf_bbands)
bb_df_xlf <- as.data.frame(XLF) %>% mutate(xlf_bb_width = (up-dn)/mavg) %>% select(xlf_bb_width)

xle_bbands <- BBands(XLE[,c("XLE.High","XLE.Low","XLE.Close")])
xle_bbands <- xle_bbands[index(xle_bbands) >= date_band,1:3]
XLE <- XLE[index(XLE) >= date_band,]
XLE <- cbind(XLE, xle_bbands)
bb_df_xle <- as.data.frame(XLE) %>% mutate(xle_bb_width = (up-dn)/mavg) %>% select(xle_bb_width)

bb_width_full <- cbind(bb_df_xlv,bb_df_tsla,bb_df_nke,bb_df_hd,bb_df_baba,bb_df_pg,bb_df_cost,bb_df_msft,bb_df_aapl,bb_df_nflx,bb_df_orcl,
                       bb_df_arkw,bb_df_kweb,bb_df_xlf,bb_df_xle)

head(bb_width_full)
##            xlv_bb_width tsla_bb_width nke_bb_width hd_bb_width baba_bb_width
## 2015-01-02   0.05046288     0.1529965   0.05647051  0.09011647    0.06810290
## 2015-01-05   0.05078036     0.1459900   0.05744874  0.08638457    0.07631263
## 2015-01-06   0.04970755     0.1429338   0.05732655  0.08406963    0.07935285
## 2015-01-07   0.04348246     0.1434461   0.05541990  0.08472429    0.08300509
## 2015-01-08   0.04458586     0.1439687   0.05511696  0.08954571    0.08384930
## 2015-01-09   0.04607035     0.1468406   0.05292499  0.08994705    0.08524019
##            pg_bb_width cost_bb_width msft_bb_width aapl_bb_width nflx_bb_width
## 2015-01-02  0.05552498    0.05351719    0.07084799    0.07453298    0.08496462
## 2015-01-05  0.05590942    0.05357144    0.06835312    0.07800818    0.07681345
## 2015-01-06  0.05637303    0.05308909    0.06760763    0.08315040    0.07392751
## 2015-01-07  0.05744363    0.05437843    0.06924066    0.08641987    0.07586328
## 2015-01-08  0.05706415    0.05914087    0.06892389    0.08526546    0.07617654
## 2015-01-09  0.05683508    0.06036266    0.06906068    0.08414230    0.07684765
##            orcl_bb_width arkw_bb_width kweb_bb_width xlf_bb_width xle_bb_width
## 2015-01-02     0.2045260    0.08349406    0.07482921   0.05511715    0.1163384
## 2015-01-05     0.2014545    0.08385767    0.06104168   0.05633616    0.1123012
## 2015-01-06     0.1980629    0.08517355    0.04761965   0.06128879    0.1120963
## 2015-01-07     0.1933610    0.08554814    0.05059501   0.06332167    0.1151997
## 2015-01-08     0.1878569    0.08450879    0.05820247   0.06309029    0.1159255
## 2015-01-09     0.1796319    0.08379646    0.06398762   0.06391163    0.1146727
#  ------------------------------------------------- Relative Strength Index (RSI) ------------------------------------------------- 

# calculations
getSymbols(tickers, from="2014-12-11", to="2021-06-30", periodicity = "daily")
##  [1] "XLV"  "TSLA" "NKE"  "HD"   "BABA" "PG"   "COST" "MSFT" "AAPL" "NFLX"
## [11] "ORCL" "ARKW" "KWEB" "XLF"  "XLE"
rsi_df_xlv <- RSI(XLV$XLV.Adjusted,14)
rsi_df_tsla <- RSI(TSLA$TSLA.Adjusted,14)
rsi_df_nke <- RSI(NKE$NKE.Adjusted,14)
rsi_df_hd <- RSI(HD$HD.Adjusted,14)
rsi_df_baba <- RSI(BABA$BABA.Adjusted,14)
rsi_df_pg <- RSI(PG$PG.Adjusted,14)
rsi_df_cost <- RSI(COST$COST.Adjusted,14)
rsi_df_msft <- RSI(MSFT$MSFT.Adjusted,14)
rsi_df_aapl <- RSI(AAPL$AAPL.Adjusted,14)
rsi_df_nflx <- RSI(NFLX$NFLX.Adjusted,14)
rsi_df_orcl <- RSI(ORCL$ORCL.Adjusted,14)
rsi_df_arkw <- RSI(ARKW$ARKW.Adjusted,14)
rsi_df_kweb <- RSI(KWEB$KWEB.Adjusted,14)
rsi_df_xlf <- RSI(XLF$XLF.Adjusted,14)
rsi_df_xle <- RSI(XLE$XLE.Adjusted,14)

# Subset data
rsi_df_xlv <- rsi_df_xlv[index(rsi_df_xlv) >= date_band,]
rsi_df_tsla <- rsi_df_tsla[index(rsi_df_tsla) >= date_band,]
rsi_df_nke <- rsi_df_nke[index(rsi_df_nke) >= date_band,]
rsi_df_hd <- rsi_df_hd[index(rsi_df_hd) >= date_band,]
rsi_df_baba <- rsi_df_baba[index(rsi_df_baba) >= date_band,]
rsi_df_pg <- rsi_df_pg[index(rsi_df_pg) >= date_band,]
rsi_df_cost <- rsi_df_cost[index(rsi_df_cost) >= date_band,]
rsi_df_msft <- rsi_df_msft[index(rsi_df_msft) >= date_band,]
rsi_df_aapl <- rsi_df_aapl[index(rsi_df_aapl) >= date_band,]
rsi_df_nflx <- rsi_df_nflx[index(rsi_df_nflx) >= date_band,]
rsi_df_orcl <- rsi_df_orcl[index(rsi_df_orcl) >= date_band,]
rsi_df_arkw <- rsi_df_arkw[index(rsi_df_arkw) >= date_band,]
rsi_df_kweb <- rsi_df_kweb[index(rsi_df_kweb) >= date_band,]
rsi_df_xlf <- rsi_df_xlf[index(rsi_df_xlf) >= date_band,]
rsi_df_xle <- rsi_df_xle[index(rsi_df_xle) >= date_band,]


# Colnames
colnames(rsi_df_xlv) <- "rsi_xlv"
colnames(rsi_df_tsla) <- "rsi_tsla"
colnames(rsi_df_nke) <- "rsi_nke"
colnames(rsi_df_hd) <- "rsi_hd"
colnames(rsi_df_baba) <- "rsi_baba"
colnames(rsi_df_pg) <- "rsi_pg"
colnames(rsi_df_cost) <- "rsi_cost"
colnames(rsi_df_msft) <- "rsi_msft"
colnames(rsi_df_aapl) <- "rsi_aapl"
colnames(rsi_df_nflx) <- "rsi_nflx"
colnames(rsi_df_orcl) <- "rsi_orcl"
colnames(rsi_df_arkw) <- "rsi_arkw"
colnames(rsi_df_kweb) <- "rsi_kweb"
colnames(rsi_df_xlf) <- "rsi_clf"
colnames(rsi_df_xle) <- "rsi_xle"

# Join all
rsi_full <- cbind(rsi_df_xlv,rsi_df_tsla,rsi_df_nke,rsi_df_hd,rsi_df_baba,rsi_df_pg,rsi_df_cost,rsi_df_msft,rsi_df_aapl,rsi_df_nflx,
                       rsi_df_orcl,rsi_df_arkw,rsi_df_kweb,rsi_df_xlf,rsi_df_xle)

head(rsi_full)
##             rsi_xlv rsi_tsla  rsi_nke   rsi_hd rsi_baba   rsi_pg rsi_cost
## 2015-01-02 45.42549 59.80079 43.47522 61.18974 45.26607 50.17064 50.69455
## 2015-01-05 43.85413 50.39656 39.05556 52.50073 37.92705 47.65977 45.24644
## 2015-01-06 42.80628 51.45762 37.57665 51.37821 46.29434 45.33031 51.67317
## 2015-01-07 51.49027 51.13097 45.35553 61.31853 43.08612 48.44138 58.65170
## 2015-01-08 56.75012 50.78383 52.60964 66.27566 51.84439 54.54587 61.64563
## 2015-01-09 53.44623 46.68741 49.17479 59.74366 46.50289 49.35264 54.03380
##            rsi_msft rsi_aapl rsi_nflx rsi_orcl rsi_arkw rsi_kweb  rsi_clf
## 2015-01-02 47.30612 44.37901 60.93870 66.76053 54.25111 49.37613 54.22045
## 2015-01-05 44.59264 38.16454 47.15141 62.81191 48.91142 50.32958 45.14509
## 2015-01-06 40.62424 38.19486 43.74835 60.03636 46.13505 51.98525 40.01307
## 2015-01-07 45.11098 42.69482 45.02201 60.07863 49.63375 55.11311 44.59585
## 2015-01-08 53.90677 52.94623 50.23489 61.22542 56.01764 59.88839 50.46476
## 2015-01-09 51.30283 53.20748 46.82964 61.08008 55.30819 56.42522 45.68728
##             rsi_xle
## 2015-01-02 66.32957
## 2015-01-05 52.94117
## 2015-01-06 49.29342
## 2015-01-07 49.82530
## 2015-01-08 55.17365
## 2015-01-09 52.97844
#  ------------------------------------------------- Moving Average Convergence Divergence (MACD) ----------------------------------------

getSymbols(tickers, from="2014-11-13", to="2021-06-30", periodicity = "daily")
##  [1] "XLV"  "TSLA" "NKE"  "HD"   "BABA" "PG"   "COST" "MSFT" "AAPL" "NFLX"
## [11] "ORCL" "ARKW" "KWEB" "XLF"  "XLE"
# Calculation
myMACD <- function (x,price,S,L,K){
  MACD <- EMA(price,S) - EMA(price,L)
  signal <- EMA(MACD,K)
  date <- x[,1]
  price <- price
  output <- cbind(date,price, MACD,signal)
  colnames(output) <- c("date","closing_price", "MACD","signal")
  return(output)
}

XLV <- myMACD(XLV,XLV$XLV.Adjusted, 12, 26,9)
TSLA <- myMACD(TSLA,TSLA$TSLA.Adjusted, 12, 26,9)
NKE <- myMACD(NKE,NKE$NKE.Adjusted, 12, 26,9)
HD <- myMACD(HD,HD$HD.Adjusted, 12, 26,9)
BABA <- myMACD(BABA,BABA$BABA.Adjusted, 12, 26,9)
PG <- myMACD(PG,PG$PG.Adjusted, 12, 26,9)
COST <- myMACD(COST,COST$COST.Adjusted, 12, 26,9)
MSFT <- myMACD(MSFT,MSFT$MSFT.Adjusted, 12, 26,9)
AAPL <- myMACD(AAPL,AAPL$AAPL.Adjusted, 12, 26,9)
NFLX <- myMACD(NFLX,NFLX$NFLX.Adjusted, 12, 26,9)
ORCL <- myMACD(ORCL,ORCL$ORCL.Adjusted, 12, 26,9)
ARKW <- myMACD(ARKW,ARKW$ARKW.Adjusted, 12, 26,9)
KWEB <- myMACD(KWEB,KWEB$KWEB.Adjusted, 12, 26,9)
XLF <- myMACD(XLF,XLF$XLF.Adjusted, 12, 26,9)
XLE <- myMACD(XLE,XLE$XLE.Adjusted, 12, 26,9)

# Subset data
XLV <- XLV[index(XLV) >= date_band,]
TSLA <- TSLA[index(TSLA) >= date_band,]
NKE <- NKE[index(NKE) >= date_band,]
HD <- HD[index(HD) >= date_band,]
BABA <- BABA[index(BABA) >= date_band,]
PG <- PG[index(PG) >= date_band,]
COST <- COST[index(COST) >= date_band,]
MSFT <- MSFT[index(MSFT) >= date_band,]
AAPL <- AAPL[index(AAPL) >= date_band,]
NFLX <- NFLX[index(NFLX) >= date_band,]
ORCL <- ORCL[index(ORCL) >= date_band,]
ARKW <- ARKW[index(ARKW) >= date_band,]
KWEB <- KWEB[index(KWEB) >= date_band,]
XLF <- XLF[index(XLF) >= date_band,]
XLE <- XLE[index(XLE) >= date_band,]

# Compute MACD - Signal Line
xlv_macd_signal_dist <- XLV$MACD-XLV$signal
tsla_macd_signal_dist <- TSLA$MACD-TSLA$signal
nke_macd_signal_dist <- NKE$MACD-NKE$signal
hd_macd_signal_dist <- HD$MACD-HD$signal
baba_macd_signal_dist <- BABA$MACD-BABA$signal
pg_macd_signal_dist <- PG$MACD-PG$signal
cost_macd_signal_dist <- COST$MACD-COST$signal
msft_macd_signal_dist <- MSFT$MACD-MSFT$signal
aapl_macd_signal_dist <- AAPL$MACD-AAPL$signal
nflx_macd_signal_dist <- NFLX$MACD-NFLX$signal
orcl_macd_signal_dist <- ORCL$MACD-ORCL$signal
arkw_macd_signal_dist <- ARKW$MACD-ARKW$signal
kweb_macd_signal_dist <- KWEB$MACD-KWEB$signal
xlf_macd_signal_dist <- XLF$MACD-XLF$signal
xle_macd_signal_dist <- XLE$MACD-XLE$signal

# rename columns
colnames(xlv_macd_signal_dist) <- "macd_xlv"
colnames(tsla_macd_signal_dist) <- "macd_tsla"
colnames(nke_macd_signal_dist) <- "macd_nke"
colnames(hd_macd_signal_dist) <- "macd_hd"
colnames(baba_macd_signal_dist) <- "macd_baba"
colnames(pg_macd_signal_dist) <- "macd_pg"
colnames(cost_macd_signal_dist) <- "macd_cost"
colnames(msft_macd_signal_dist) <- "macd_msft"
colnames(aapl_macd_signal_dist) <- "macd_aapl"
colnames(nflx_macd_signal_dist) <- "macd_nflx"
colnames(orcl_macd_signal_dist) <- "macd_orcl"
colnames(arkw_macd_signal_dist) <- "macd_arkw"
colnames(kweb_macd_signal_dist) <- "macd_kweb"
colnames(xlf_macd_signal_dist) <- "macd_clf"
colnames(xle_macd_signal_dist) <- "macd_xle"

# Join all
macd_full <- cbind(xlv_macd_signal_dist,tsla_macd_signal_dist,nke_macd_signal_dist,hd_macd_signal_dist,baba_macd_signal_dist,
                   pg_macd_signal_dist,cost_macd_signal_dist,msft_macd_signal_dist,aapl_macd_signal_dist,nflx_macd_signal_dist,
                   orcl_macd_signal_dist,arkw_macd_signal_dist,kweb_macd_signal_dist,xlf_macd_signal_dist,xle_macd_signal_dist)

head(macd_full)
##               macd_xlv macd_tsla    macd_nke     macd_hd   macd_baba    macd_pg
## 2015-01-02 -0.11591153 0.6945530  0.05833138  0.18619906 -0.21491880 -0.1947895
## 2015-01-05 -0.13248686 0.4911598 -0.02259432 -0.03252131 -0.39040848 -0.2804253
## 2015-01-06 -0.15020551 0.3695569 -0.08632948 -0.19743196 -0.31785197 -0.3480437
## 2015-01-07 -0.06254512 0.2853501 -0.06233897 -0.11197807 -0.31690419 -0.3537977
## 2015-01-08  0.06322748 0.2283467  0.02354512  0.06286569 -0.09915219 -0.2902478
## 2015-01-09  0.10189823 0.1445298  0.04677473  0.05651900 -0.07004544 -0.2847010
##              macd_cost   macd_msft   macd_aapl macd_nflx   macd_orcl
## 2015-01-02  0.05060113  0.01569016  0.05808156 0.5342709 -0.01401458
## 2015-01-05 -0.10687506 -0.01917422 -0.01524041 0.3782535 -0.12251508
## 2015-01-06 -0.10361582 -0.07510853 -0.05694365 0.2196776 -0.21945489
## 2015-01-07  0.03402624 -0.07004397 -0.05507218 0.1344098 -0.27702736
## 2015-01-08  0.18200160  0.01716602  0.01253055 0.1500341 -0.29185468
## 2015-01-09  0.13988796  0.05193254  0.05952015 0.1144129 -0.29460009
##               macd_arkw macd_kweb    macd_clf   macd_xle
## 2015-01-02  0.053654744 0.2225905  0.01557959 0.59704233
## 2015-01-05  0.016897021 0.2401108 -0.01933961 0.38068023
## 2015-01-06 -0.016224385 0.2586023 -0.05814199 0.18274334
## 2015-01-07 -0.025241913 0.2857616 -0.06891458 0.06881917
## 2015-01-08 -0.007155617 0.3301041 -0.05637563 0.08807937
## 2015-01-09  0.002775883 0.3304442 -0.06124702 0.07729385

2.0.2 Correlation (Information Coefficient)

  • In this sub section, we are going to find the correlation between the features that we initially believe might be able to use to predict the stock returns. We have divided the features into the time period of the data available.
  • Some of the features have daily data, while some is weekly and some is monthly. At the end, we will have all the relevant correlation coefficient of all the features with the returns of the stocks in our universe.
  • We will use the quantmod package to do the respective returns calculation for different time period.
    • Daily Data: Already did in the Section 1.0.2
    • Weekly Data: weeklyReturn
    • Monthly Data: monthlyReturn

Features with daily data

# Check correlation of features with individual stock - if there is relationship, then include in the model.
# if it can predict the stock return

relationship <- cor(AdCloseReturns[-1], indicators$vix_return[-dim(indicators)[1]]) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$oil_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$us_dollar_index_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$gold_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$sp500_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$djia_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$sp400_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$nasdaq_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$ixco_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$sox_return[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$fed_funds_rate[-dim(indicators)[1]])) %>% 
  cbind(cor(AdCloseReturns[-1], indicators$osr_fsi[-dim(indicators)[1]])) %>%
  cbind(cor(AdCloseReturns[-1], indicators$term_spread[-dim(indicators)[1]], use = "complete.obs"))

colnames(relationship) <- c("VIX", "Oil", "US Dollar Index", "Gold", "SP500", 
                            "DJIA", "SP400 Mid Cap", "NASDAQ", "IXCO", "SOX", "Fed Funds Rates", 
                            "OSR FSI", "Term Spread")


# BB Width, RSI, MACD
options(scipen = 999)

new_close <- AdCloseReturns[-1] # remove first row
new_bb_width_full <- bb_width_full[-nrow(bb_width_full),] # remove last row
bb_width_matrix <- as.matrix(diag(cor(new_close,new_bb_width_full)))

new_rsi_full <- rsi_full[-nrow(rsi_full),] # remove last row
rsi_matrix<- as.matrix(diag(cor(new_close,new_rsi_full)))

new_macd_full <- macd_full[-nrow(macd_full),] # remove last row
macd_matrix <- as.matrix(diag(cor(new_close,new_macd_full)))


df_2 <- cbind(bb_width_matrix,rsi_matrix,macd_matrix)
rownames(df_2) <- tickers
colnames(df_2) <- c("Bollinger Band Width", "RSI", "MACD")

relationship <- cbind(relationship,df_2)

Feature with weekly data i.e. AAII Sentiment

# Get the price data
tickers <- c("XLV","TSLA","NKE","HD","BABA","PG","COST", "MSFT", "AAPL", "NFLX", "ORCL", "ARKW", "KWEB", "XLF", "XLE")
getSymbols(tickers, from="2015-01-01", to="2021-06-30", periodicity = "weekly")
##  [1] "XLV"  "TSLA" "NKE"  "HD"   "BABA" "PG"   "COST" "MSFT" "AAPL" "NFLX"
## [11] "ORCL" "ARKW" "KWEB" "XLF"  "XLE"
#Make return data frame
weekly_get.AdReturns <- function(x) {weeklyReturn(Ad(get(x)))} #obtain the weekly returns
weekly_AdCloseReturns <- do.call(merge, lapply(tickers, weekly_get.AdReturns))
colnames(weekly_AdCloseReturns) <- tickers
head(weekly_AdCloseReturns)
##                     XLV        TSLA          NKE           HD        BABA
## 2015-01-01  0.000000000  0.00000000  0.000000000  0.000000000  0.00000000
## 2015-01-08  0.002440733 -0.08656082 -0.012227214 -0.016952731 -0.02496813
## 2015-01-15  0.009882994  0.02013600 -0.002134122  0.008671395  0.03725647
## 2015-01-22 -0.010353576  0.01424434 -0.002139122  0.011687403 -0.04685840
## 2015-01-29 -0.007595111  0.09620299 -0.008251775  0.035038912 -0.08583034
## 2015-02-05  0.013573916 -0.02630972 -0.013291575  0.017526170 -0.04444444
##                       PG         COST          MSFT         AAPL         NFLX
## 2015-01-01  0.0000000000  0.000000000  0.0000000000  0.000000000  0.000000000
## 2015-01-08 -0.0008882693 -0.034714963 -0.0058401344  0.019025686 -0.009046494
## 2015-01-15  0.0082228881 -0.001220171 -0.0008704062 -0.002277105  0.262274849
## 2015-01-22 -0.0548261667  0.003449750 -0.1030051577  0.052578846  0.081069212
## 2015-01-29  0.0075165736  0.116745601  0.0157806410  0.036857173  0.014125632
## 2015-02-05 -0.0017484798 -0.054258527  0.0129065561  0.044496527  0.013772703
##                     ORCL         ARKW          KWEB          XLF          XLE
## 2015-01-01  0.0000000000  0.000000000  0.0000000000  0.000000000  0.000000000
## 2015-01-08  0.0006229658 -0.005191038 -0.0280732470 -0.024491392 -0.019659916
## 2015-01-15  0.0099858573  0.025589496  0.0288841180 -0.002127538  0.041327626
## 2015-01-22 -0.0183951799  0.000489332  0.0005911633 -0.010660966 -0.028626779
## 2015-01-29 -0.0063243616  0.025916761  0.0026579265  0.021982800  0.053851275
## 2015-02-05  0.0221591674  0.022878821 -0.0238586453  0.021088177  0.008008035
# sentiment data (extract the bullish sentiment column)
bullish_sentiment <- sentiment[,1:2]
bullish_sentiment$ï..Reported.Date <- as.Date(bullish_sentiment$ï..Reported.Date)
bullish_sentiment <- bullish_sentiment[bullish_sentiment$ï..Reported.Date<="2021-06-24",] # to get the same dates with the assets returns dataframe
bullish_sentiment$Bullish <- as.numeric(gsub("%", "", bullish_sentiment$Bullish))
colnames(bullish_sentiment) <- c("date", "bullish_sentiment")

# "2021-01-14" - missing in bullish sentiment data, so will remove this date from the adclosereturn in order to have same no. of rows for calculating dimension
weekly_AdCloseReturns <- weekly_AdCloseReturns[-c(316),]

corr <- cor(weekly_AdCloseReturns[-1], bullish_sentiment$bullish_sentiment[-dim(bullish_sentiment)[1]])
colnames(corr) <- "Bullish Sentiment"

# add the column to the relationship table
relationship <- relationship %>% cbind(corr)

Feature with monthly data i.e. unemployment claims, industrial production index and change in CPI

# Get the price data
tickers <- c("XLV","TSLA","NKE","HD","BABA","PG","COST", "MSFT", "AAPL", "NFLX", "ORCL", "ARKW", "KWEB", "XLF", "XLE")
getSymbols(tickers, from="2015-01-01", to="2021-06-30", periodicity = "monthly")
##  [1] "XLV"  "TSLA" "NKE"  "HD"   "BABA" "PG"   "COST" "MSFT" "AAPL" "NFLX"
## [11] "ORCL" "ARKW" "KWEB" "XLF"  "XLE"
#Make return data frame
monthly_get.AdReturns <- function(x) {monthlyReturn(Ad(get(x)))} #obtain the monthly returns
monthly_get.AdReturns <- do.call(merge, lapply(tickers, monthly_get.AdReturns))
colnames(monthly_get.AdReturns) <- tickers
head(monthly_get.AdReturns)
##                     XLV         TSLA         NKE           HD        BABA
## 2015-01-01  0.000000000  0.000000000  0.00000000  0.000000000  0.00000000
## 2015-02-01  0.042875743 -0.001277063  0.05279144  0.098927880 -0.04445441
## 2015-03-01  0.003599377 -0.071653316  0.03604868 -0.009935121 -0.02208652
## 2015-04-01 -0.008129948  0.197488918 -0.01485147 -0.053511926 -0.02342620
## 2015-05-01  0.045042537  0.109489076  0.02863215  0.041502888  0.09878212
## 2015-06-01 -0.007339417  0.069617225  0.06535764 -0.002602706 -0.07892972
##                      PG          COST        MSFT         AAPL        NFLX
## 2015-01-01  0.000000000  0.0000000000  0.00000000  0.000000000  0.00000000
## 2015-02-01  0.017148903  0.0277639267  0.08539605  0.096449265  0.07494340
## 2015-03-01 -0.037472028  0.0676034940 -0.06614927 -0.027548912 -0.12259164
## 2015-04-01 -0.029655844 -0.0557758491  0.19626187  0.005786497  0.33552523
## 2015-05-01 -0.006156120 -0.0004883873 -0.03659534  0.040990540  0.12140161
## 2015-06-01 -0.001913638 -0.0528086799 -0.05170887 -0.033205763  0.05268725
##                     ORCL         ARKW        KWEB          XLF         XLE
## 2015-01-01  0.0000000000  0.000000000  0.00000000  0.000000000  0.00000000
## 2015-02-01  0.0489122359  0.082560010 -0.01368603  0.058235520  0.04592948
## 2015-03-01 -0.0152896351 -0.006769027  0.05673760 -0.009856411 -0.01822304
## 2015-04-01  0.0108919883  0.019536515  0.15698860  0.004565444  0.07298000
## 2015-05-01  0.0005149482  0.030748674  0.03656984  0.019477787 -0.05188691
## 2015-06-01 -0.0733504665 -0.003026496 -0.01143520 -0.008942971 -0.04120382
# INDPRO.csv - Industrial Production Index. INDPRO only has data up till May
# ICSA.csv - Unemployment Claims, has data till Jul --> remove Jun, Jul
# Montly Returns dataframe - has data till Jun --> remove Jun
ICSA <- ICSA[-c(78,79),]
monthly_get.AdReturns <- monthly_get.AdReturns[-c(78),]

ICSA$ICSA <- as.numeric(ICSA$ICSA)
INDPRO$INDPRO <- as.numeric(INDPRO$INDPRO)
corr2 <- cor(monthly_get.AdReturns[-1], ICSA$ICSA[-dim(ICSA)[1]]) %>% 
  cbind(cor(monthly_get.AdReturns[-1], INDPRO$INDPRO[-dim(INDPRO)[1]])) %>%
  cbind(cor(monthly_get.AdReturns[-1], cpi$change_in_cpi[-dim(cpi)[1]])) 
colnames(corr2) <- c("Unemployment Claims", "Industrial Production Index", "Change in CPI")


relationship <- relationship %>% cbind(corr2)

Correlation of our features with the individual stocks/etfs

head(relationship)
##              VIX          Oil US Dollar Index       Gold       SP500
## XLV   0.04723616 -0.043314021    -0.059397253 0.05389669 -0.15657956
## TSLA -0.03157975  0.014419811     0.001243666 0.03731201 -0.01174354
## NKE   0.02105278 -0.003062915    -0.058456610 0.08277621 -0.06654616
## HD    0.04596166 -0.033981289    -0.022320463 0.03966893 -0.20662765
## BABA  0.01716659 -0.023654301    -0.069055066 0.06021857 -0.05790932
## PG    0.08427416 -0.058473434    -0.046765170 0.03668532 -0.17998457
##             DJIA SP400 Mid Cap        NASDAQ        IXCO         SOX
## XLV  -0.14869678   -0.12030789 -0.1394731414 -0.13853717 -0.11381893
## TSLA -0.01244467   -0.01754007  0.0007942614 -0.00419036 -0.02445116
## NKE  -0.07001920   -0.04580438 -0.0489515535 -0.05271120 -0.02724131
## HD   -0.20150998   -0.17142184 -0.1723204263 -0.17000849 -0.14177474
## BABA -0.06194337   -0.05509271 -0.0400211989 -0.04201192 -0.04068407
## PG   -0.18114632   -0.16121513 -0.1499399583 -0.14508770 -0.12735066
##      Fed Funds Rates    OSR FSI Term Spread Bollinger Band Width          RSI
## XLV     -0.017736395 0.04790299 -0.01066530           0.04327112 -0.051427039
## TSLA    -0.030816168 0.02793434 -0.03970621           0.06122224  0.027328476
## NKE     -0.020960731 0.05247311 -0.01180389           0.04249013 -0.041398313
## HD      -0.023508815 0.06143206 -0.00750954           0.06846507 -0.018821387
## BABA     0.001831304 0.03265378 -0.02623712           0.03337261 -0.001418716
## PG       0.024523832 0.04825798 -0.04120667           0.05087497 -0.038018101
##              MACD Bullish Sentiment Unemployment Claims
## XLV  -0.031061840       -0.01511725          0.33021686
## TSLA  0.013454007       -0.08019259          0.36767170
## NKE  -0.042891083       -0.08182325          0.19453277
## HD    0.008552912       -0.07300195          0.39817033
## BABA -0.035583428       -0.05613737          0.05049277
## PG   -0.079148991       -0.06520176          0.16835890
##      Industrial Production Index Change in CPI
## XLV                  -0.07929311  -0.073541687
## TSLA                 -0.26972882  -0.070777733
## NKE                  -0.10213119  -0.037446860
## HD                   -0.21534559  -0.225918421
## BABA                 -0.09523088  -0.001242757
## PG                   -0.05925203   0.030911227

3 Construction of Team Universe Portfolio

# mean daily return of each stock/etf
mean_ret <- colMeans(hist.return)

# annualised covariance matrix
cov_mat <- cov(hist.return) * 250

# Run a 5000 random portfolio
set.seed(123) #set.seed when running simulations to ensure all results, figures, etc are reproducible. They will continue to be the same.
num_portfolio <- 5000

# Create a matrix to store the weights of individual assets that make up the portfolio
all_weights <- matrix(nrow = num_portfolio,
                  ncol = length(tickers))

# Creating an empty vector to store portfolio Returns
portfolio_returns <- vector('numeric', length = num_portfolio)

# Creating an empty vector to store portfolio S.D.
portfolio_risk <- vector('numeric', length = num_portfolio)

# Creating an empty vector to store portfolio Sharpe Ratio
sharpe_ratio <- vector('numeric', length = num_portfolio)

for (i in seq_along(portfolio_returns)) {
  
  # Weights of assets in the portfolio
  weights <- runif(length(tickers))
  weights <- weights/sum(weights) # to make sure that all add up to 1
  all_weights[i,] <- weights
  
  # Portfolio Returns
  port_ret <- sum(weights * mean_ret)
  port_ret <- ((port_ret + 1)^250) - 1 #annual returns
  portfolio_returns[i] <- port_ret
  
  # Portfolio risk
  port_sd <- sqrt(t(weights) %*% (cov_mat  %*% weights)) #Expected portfolio s.d.= SQRT(t(W) * ((Covariance Matrix) * W))
  portfolio_risk[i] <- port_sd
  
  # Sharpe Ratios (Assuming 0% Rf rate)
  sr <- port_ret/port_sd
  sharpe_ratio[i] <- sr
  
}

# Store all the above calculated values into a table
portfolio_weights <- tibble(Return = portfolio_returns,
                  Risk = portfolio_risk,
                  Sharpe_Ratio = sharpe_ratio)

all_weights <- tk_tbl(all_weights)
colnames(all_weights) <- colnames(hist.return.ts) # rename the columns of the weights dataframe

# Combine both the weights df and portfolio df into a single df
portfolio_weights <- tk_tbl(cbind(all_weights, portfolio_weights))
portfolio_weights 
## # A tibble: 5,000 x 18
##       XLV   TSLA     NKE     HD    BABA      PG   COST    MSFT   AAPL   NFLX
##     <dbl>  <dbl>   <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
##  1 0.0337 0.0922 0.0479  0.103  0.110   0.00533 0.0618 0.104   0.0645 0.0534
##  2 0.104  0.0285 0.00488 0.0380 0.111   0.103   0.0803 0.0743  0.115  0.0760
##  3 0.140  0.131  0.101   0.116  0.00358 0.0695  0.110  0.0315  0.0463 0.0337
##  4 0.0221 0.0371 0.0741  0.0423 0.136   0.00729 0.0703 0.127   0.0194 0.0892
##  5 0.0820 0.0117 0.0474  0.0338 0.100   0.0553  0.0999 0.100   0.0980 0.0542
##  6 0.0303 0.0522 0.0843  0.0484 0.0153  0.0335  0.0919 0.0574  0.108  0.0141
##  7 0.0187 0.0932 0.0490  0.0938 0.0457  0.0268  0.112  0.0134  0.0666 0.0730
##  8 0.104  0.107  0.0710  0.0479 0.0172  0.109   0.0351 0.00708 0.111  0.0841
##  9 0.0879 0.0434 0.0418  0.0298 0.0501  0.134   0.0209 0.0124  0.0193 0.0937
## 10 0.0947 0.118  0.113   0.141  0.0630  0.0447  0.0587 0.00150 0.0264 0.121 
## # ... with 4,990 more rows, and 8 more variables: ORCL <dbl>, ARKW <dbl>,
## #   KWEB <dbl>, XLF <dbl>, XLE <dbl>, Return <dbl>, Risk <dbl>,
## #   Sharpe_Ratio <dbl>

3.0.1 Types of Portfolio

Using the dataframe that we have, we can decide what type of portfolio we want to construct and hence its respective weights in the portfolio. The following are the portfolios that we will consider.

  • Minimum Variance Portfolio (Portfolio with minimal volatility among the most efficient portfolio)
  • Tangency Portfolio (Portfolio of risky assets with the least ratio of volatility of return - highest sharpe ratio)
  • Equal Weights Portfolio (All the assets are equally weighted in the portfolio)

1. Minimum Variance Portfolio

# choosing the portfolio that gives the minimum risk
minimum_var <- portfolio_weights[which.min(portfolio_weights$Risk),]
minimum_var$Risk
## [1] 0.1736868
print(minimum_var$Return)
## [1] 0.2026216
Stocks/ETFs Portfolio Weights
XLV 17.6%
TSLA 0.638%
NKE 1.80%
HD 6.19%
BABA 3.94%
PG 17.5%
COST 12.6%
MSFT 6.40%
AAPL 6.69%
NFLX 1.49%
ORCL 7.86%
ARKW 1.63%
KWEB 13.1%
XLF 1.05%
XLE 1.53%
p_weights <- t(minimum_var[,1:15])
colnames(p_weights) <- "weights"
p_weights <- as.data.frame(p_weights)
p_weights <- cbind(ticker = rownames(p_weights), p_weights)
rownames(p_weights) <- 1:nrow(p_weights)
p_weights$weights <- as.numeric(p_weights$weights)

fig <- plot_ly(p_weights, labels = ~ticker, values = ~weights, type = 'pie')
fig <- fig %>% layout(title = 'Minimum Variance Portfolio Asset Weights')

fig
#Portfolio Daily Return Rebalance every Quarters

mv_weights <- as.vector(minimum_var[,1:15] %>% unlist())
mv_portfolio_Return <- Return.portfolio(AdCloseReturns, weights = mv_weights, rebalance_on = "quarters")

chart.Histogram(mv_portfolio_Return, main = "Portfolio Daily Returns Distributions", cex.axis = 1.2, cex.lab = 1.5, cex.main = 2,colorset = "#F77171", note.line = mean(Return.portfolio(mv_portfolio_Return)), note.label = 'Average', note.color = 'black', note.cex = 1.2)

2. Tangency Portfolio

# choosing the portfolio that gives the highest sharpe ratio
maximum_sr <- portfolio_weights[which.max(portfolio_weights$Sharpe_Ratio),]
print(maximum_sr$Sharpe_Ratio)
## [1] 1.639583
print(maximum_sr$Return)
## [1] 0.3790735
Stocks/ETFs Portfolio Weights
XLV 4.82%
TSLA 14.6%
NKE 0.0473%
HD 4.77%
BABA 4.47%
PG 1.53%
COST 14.4%
MSFT 9.21%
AAPL 5.13%
NFLX 15.4%
SE 4.30%
ARKW 13.9%
KWEB 4.06%
XLF 0.282%
XLE 3.06%
p_weights_2 <- t(maximum_sr[,1:15])
colnames(p_weights_2) <- "weights"
p_weights_2 <- as.data.frame(p_weights_2)
p_weights_2 <- cbind(ticker = rownames(p_weights_2), p_weights_2)
rownames(p_weights_2) <- 1:nrow(p_weights_2)
p_weights_2$weights <- as.numeric(p_weights_2$weights)

fig1 <- plot_ly(p_weights_2, labels = ~ticker, values = ~weights, type = 'pie')
fig1 <- fig1 %>% layout(title = 'Tangency Portfolio Asset Weights')

fig1
#Portfolio Daily Return Rebalance every Quarters

tp_weights <- as.vector(maximum_sr[,1:15] %>% unlist())
tp_portfolio_Return <- Return.portfolio(AdCloseReturns, weights = tp_weights, rebalance_on = "quarters")

chart.Histogram(tp_portfolio_Return, main = "Portfolio Daily Returns Distributions", cex.axis = 1.2, cex.lab = 1.5, cex.main = 2,colorset = "#F77171", note.line = mean(Return.portfolio(tp_portfolio_Return)), note.label = 'Average', note.color = 'black', note.cex = 1.2)

3. Equal Weightage Portfolio

eq_weight <- rep(1/length(tickers), length(tickers))

#equal weights portfolio returns
eq_port_ret <- sum(eq_weight * mean_ret)
eq_port_ret <- ((eq_port_ret + 1)^250) - 1 #annual returns

#equal weights portfolio s.d.
eq_port_sd <- as.vector(sqrt(t(weights) %*% (cov_mat  %*% weights)))

#equal weights portfolio sharpe ratio
eq_port_sr <- as.vector(port_ret/port_sd)

eq_portfolio <- append(eq_weight,eq_port_ret) # add to the vector
eq_portfolio <- append(eq_portfolio,eq_port_sd) # add to the vector
eq_portfolio <- append(eq_portfolio,eq_port_sr) # add to the vector


#Portfolio Daily Return Rebalance every Quarters
eq_weight_portfolio_Return <- Return.portfolio(AdCloseReturns, weights = eq_weight, rebalance_on = "quarters")

chart.Histogram(eq_weight_portfolio_Return, main = "Portfolio Daily Returns Distributions", cex.axis = 1.2, cex.lab = 1.5, cex.main = 2,colorset = "#F77171", note.line = mean(Return.portfolio(eq_weight_portfolio_Return)), note.label = 'Average', note.color = 'black', note.cex = 1.2)

Overall efficient frontier chart

p <- portfolio_weights %>%
  ggplot(aes(x = Risk, y = Return, color = Sharpe_Ratio)) +
  geom_point() +
  theme_classic() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(labels = scales::percent) +
  labs(x = 'Annualized Risk',
       y = 'Annualized Returns',
       title = "Portfolio Optimization & Efficient Frontier") +
  geom_point(aes(x = Risk,
                 y = Return), data = minimum_var, color = 'red') +
  geom_point(aes(x = Risk,
                 y = Return), data = maximum_sr, color = 'red') +
  expand_limits(x= 0, y =0)

ggplotly(p)

Overview of the 3 types of portfolio

`Type of Portfolio` <- c("Minimum Variance Portfolio", "Tangency Portfolio", "Equal Weightage Portfolio")
portfolios <- rbind(minimum_var,maximum_sr,eq_portfolio)
portfolios <- cbind(`Type of Portfolio`,portfolios)
print(portfolios)
##            Type of Portfolio        XLV       TSLA          NKE         HD
## 1 Minimum Variance Portfolio 0.17586213 0.00637531 0.0179555523 0.06193003
## 2         Tangency Portfolio 0.04824505 0.14620871 0.0004730365 0.04773789
## 3  Equal Weightage Portfolio 0.06666667 0.06666667 0.0666666667 0.06666667
##         BABA         PG       COST       MSFT       AAPL       NFLX       ORCL
## 1 0.03935670 0.17491319 0.12639856 0.06400701 0.06685313 0.01487256 0.07856802
## 2 0.04472338 0.01525705 0.14402430 0.09211574 0.05129551 0.15419997 0.04300731
## 3 0.06666667 0.06666667 0.06666667 0.06666667 0.06666667 0.06666667 0.06666667
##         ARKW       KWEB         XLF        XLE    Return      Risk Sharpe_Ratio
## 1 0.01631139 0.13077219 0.010545235 0.01527898 0.2026216 0.1736868     1.166592
## 2 0.13870618 0.04055427 0.002817264 0.03063434 0.3790735 0.2312012     1.639583
## 3 0.06666667 0.06666667 0.066666667 0.06666667 0.2693762 0.1966344     1.333700

We will thus pick the Tangency Portfolio since it gives the highest return and sharpe ratio among the 3 efficient portfolios constructed.

4 Portfolio

# obtain the daily expected return of this portfolio
port_returns <- as.matrix(AdCloseReturns) %*% as.matrix(p_weights_2[,2])
port_returns <- as.data.frame(port_returns)
colnames(port_returns) <- "daily.port.return"
port_returns <- cbind(date = rownames(port_returns), port_returns)
rownames(port_returns) <- 1:nrow(port_returns)
port_returns$date <- as.Date(port_returns$date)


#cumulative return - for constructed portfolio
port_returns$cumulative.return <- rep(NaN, dim(port_returns)[1]) # fill column with NA values #dim(DIS)[1] means number of rows

for (i in 2:dim(port_returns)[1]) {
  port_returns$cumulative.return[i] <- prod(1+port_returns$daily.port.return[2:i])-1
}

#cumulative return - for individual stocks
indiv_stock_cumulative_returns <- as.xts(apply(AdCloseReturns, 2, cumsum))

# Merge
new_dataframe <- cbind(indiv_stock_cumulative_returns,port_returns$cumulative.return)
colnames(new_dataframe)[16] <- "portfolio"

# Convert dataframe from wide to long format to plot chart
new_df <- data.frame(date=index(new_dataframe), coredata(new_dataframe))
new_df <- new_df %>% gather("ticker", "returns", -date) 

# Draw cumulative return
new_df %>%
  ggplot(aes(x=date, y=returns*100, group=ticker, color=ticker)) +
  geom_line() +
  ggtitle("Performance - Daily Cumulative Returns of Portfolio vs. Individual Stocks/ETFs") +
  theme_ipsum() +
  ylab("Daily Cumulative Returns (%)") +
  xlab("Date") + 
  theme(plot.title = element_text(hjust = 0.5,size=10.5))

  • Miscellaneous Plots
plot1 <- ggplot(AAPL, aes(x = index(AAPL))) + geom_line(aes(y = AAPL.Adjusted, colour = "AAPL Adjusted Closing Price")) +
  scale_colour_manual(values = c("darkred"))
plot2 <- ggplot(rsi_df_aapl, aes(x = index(rsi_df_aapl))) +  geom_line(aes(y = rsi_aapl, colour = "RSI")) +
  scale_colour_manual(values = c("steelblue"))

grid.newpage()
grid.draw(rbind(ggplotGrob(plot1), ggplotGrob(plot2), size = "last"))


AAPL.v1 <- myMACD(AAPL,AAPL$AAPL.Adjusted, 12, 26,9)
AAPL.v1 <- AAPL.v1[index(AAPL.v1) >= date_band,]
AAPL.v1$dist <-AAPL.v1$MACD-AAPL.v1$signal 
AAPL.v1$date <- as.character(index(AAPL.v1))
AAPL.v1 <- as.data.frame(AAPL.v1)
AAPL.v1$date <-as.Date(AAPL.v1$date)

AAPL.v2 <- AAPL.v1
AAPL.v2$direction <- NA
AAPL.v2$direction <- ifelse(AAPL.v1$dist>0,"Increasing","Decreasing")
AAPL.v2$date <-as.Date(AAPL.v2$date)
AAPL.v2 <- AAPL.v2[,c(1,6)]
AAPL.v3 <- AAPL.v1 %>% left_join(AAPL.v2, by="date")

AAPL.v3$MACD <- as.numeric(AAPL.v3$MACD)
AAPL.v3$signal <- as.numeric(AAPL.v3$signal)
AAPL.v3$dist <- as.numeric(AAPL.v3$dist)

macd_chart <- ggplot(AAPL.v3, aes(x = date))
macd_chart <- macd_chart + geom_line(aes(y = MACD, colour = "Moving Average Covergence Divergence (12,6,9)"))
macd_chart <- macd_chart + geom_line(aes(y = signal, colour = "Signal"), linetype="dashed")
macd_chart <- macd_chart + geom_bar(aes(y = dist,fill = as.factor(direction)),stat = "identity") + scale_fill_manual(values = c("Increasing" = "#008000", "Decreasing" = "#FF0000"))
macd_chart <- macd_chart + scale_colour_manual(values = c("darkred", "steelblue"))
macd_chart <- macd_chart + scale_x_date(limits = as.Date(c('2018-01-01','2020-06-30')))
macd_chart
AAPL.v1$MACD <- as.numeric(as.character(AAPL.v1$MACD))
# Technical Factors
factor.model <- lm(AdCloseReturns$XLV ~ rsi_full$rsi_xlv + macd_full$macd_xlv + bb_width_full$xlv_bb_width)
summary(factor.model)
## 
## Call:
## lm(formula = AdCloseReturns$XLV ~ rsi_full$rsi_xlv + macd_full$macd_xlv + 
##     bb_width_full$xlv_bb_width)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.096683 -0.005577 -0.000197  0.005180  0.075351 
## 
## Coefficients:
##                               Estimate  Std. Error t value            Pr(>|t|)
## (Intercept)                -0.01899862  0.00181294 -10.479 <0.0000000000000002
## rsi_full$rsi_xlv            0.00033175  0.00002991  11.090 <0.0000000000000002
## macd_full$macd_xlv         -0.00071105  0.00112595  -0.632              0.5278
## bb_width_full$xlv_bb_width  0.02260788  0.00697011   3.244              0.0012
##                               
## (Intercept)                ***
## rsi_full$rsi_xlv           ***
## macd_full$macd_xlv            
## bb_width_full$xlv_bb_width ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01085 on 1630 degrees of freedom
## Multiple R-squared:  0.09593,    Adjusted R-squared:  0.09426 
## F-statistic: 57.65 on 3 and 1630 DF,  p-value: < 0.00000000000000022
names(factor.model)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
factor.model$coefficients
##                (Intercept)           rsi_full$rsi_xlv 
##              -0.0189986229               0.0003317537 
##         macd_full$macd_xlv bb_width_full$xlv_bb_width 
##              -0.0007110457               0.0226078782
factor.exposure <- factor.model$coefficients[2:4]

# Sentiment Factor
factor.model.2 <- lm(AdCloseReturns$XLV ~ rsi_full$rsi_xlv + macd_full$macd_xlv + bb_width_full$xlv_bb_width)

5 Portfolios Cumulative Returns Chart

# Tangency Portfolio Cumulative Returns
port_returns <- as.matrix(AdCloseReturns) %*% as.matrix(p_weights_2[,2])
port_returns <- as.data.frame(port_returns)
colnames(port_returns) <- "daily.port.return"
port_returns <- cbind(date = rownames(port_returns), port_returns)
rownames(port_returns) <- 1:nrow(port_returns)
port_returns$date <- as.Date(port_returns$date)


port_returns$cumulative.return <- rep(NaN, dim(port_returns)[1]) # fill column with NA values #dim(DIS)[1] means number of rows

for (i in 2:dim(port_returns)[1]) {
  port_returns$cumulative.return[i] <- prod(1+port_returns$daily.port.return[2:i])-1
}


# Min Variance Portfolio Cumulative Returns
mvp_port_returns <- as.matrix(AdCloseReturns) %*% as.matrix(p_weights[,2])
mvp_port_returns <- as.data.frame(mvp_port_returns)
colnames(mvp_port_returns) <- "daily.port.return"
mvp_port_returns <- cbind(date = rownames(mvp_port_returns), mvp_port_returns)
rownames(mvp_port_returns) <- 1:nrow(mvp_port_returns)
mvp_port_returns$date <- as.Date(mvp_port_returns$date)


#cumulative return - for constructed portfolio
mvp_port_returns$cumulative.return <- rep(NaN, dim(mvp_port_returns)[1]) # fill column with NA values #dim(DIS)[1] means number of rows

for (i in 2:dim(mvp_port_returns)[1]) {
  mvp_port_returns$cumulative.return[i] <- prod(1+mvp_port_returns$daily.port.return[2:i])-1
}



# Equal Weighted Portfolio Cumulative Returns

a <- as.data.frame(eq_weight)
b<- as.data.frame(tickers)
c <- cbind(b,a)
eq_port_returns <- as.matrix(AdCloseReturns) %*% as.matrix(c[,2])
eq_port_returns <- as.data.frame(eq_port_returns)
colnames(eq_port_returns) <- "daily.port.return"
eq_port_returns <- cbind(date = rownames(eq_port_returns), eq_port_returns)
rownames(eq_port_returns) <- 1:nrow(eq_port_returns)
eq_port_returns$date <- as.Date(eq_port_returns$date)


#cumulative return - for constructed portfolio
eq_port_returns$cumulative.return <- rep(NaN, dim(eq_port_returns)[1]) # fill column with NA values #dim(DIS)[1] means number of rows

for (i in 2:dim(eq_port_returns)[1]) {
  eq_port_returns$cumulative.return[i] <- prod(1+eq_port_returns$daily.port.return[2:i])-1
}



#SP500 cumulative returns
sp500_returns <- as.data.frame(indicators$sp500_return)
colnames(sp500_returns) <- "sp500"
sp500_returns$cumulative.return <- rep(NaN, dim(sp500_returns)[1]) # fill column with NA values #dim(DIS)[1] means number of rows

for (i in 2:dim(sp500_returns)[1]) {
  sp500_returns$cumulative.return[i] <- prod(1+sp500_returns$sp500[2:i])-1
}


# Merge
X3_port_df <- cbind(mvp_port_returns[,c(1,3)], port_returns[,3], eq_port_returns[,3],sp500_returns[,2])
  
  
colnames(X3_port_df) <- c("date", "Min Var Portfolio", "Tangency Portfolio", "Equal-weighted Portfolio", "SP500")

# Convert dataframe from wide to long format to plot chart
X3_port_df$date <- as.Date(X3_port_df$date)
X3_port_df <- X3_port_df[-1,]
X3_port_df_new <- X3_port_df %>% gather("portfolio", "cum_returns", -date) 

# Draw cumulative return
data_a <- X3_port_df_new[X3_port_df_new$portfolio != "SP500",]
data_a %>%
  ggplot(aes(x=date, y=cum_returns*100, group=portfolio, color=portfolio)) +
  geom_line() +
  ggtitle("Performance - Daily Cumulative Returns of 3 DIfferent Portfolios") +
  theme_ipsum() +
  ylab("Daily Cumulative Returns (%)") +
  theme(plot.title = element_text(hjust = 0.5,size=14.5))
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

# annualized return (change from cumulative returns to annualised returns)
(1+X3_port_df$`Min Var Portfolio`[dim(X3_port_df)[1]])^(1/7)-1  # 7 years of data
## [1] 0.1713734
(1+X3_port_df$`Tangency Portfolio`[dim(X3_port_df)[1]])^(1/7)-1
## [1] 0.3166056
(1+X3_port_df$`Equal-weighted Portfolio`[dim(X3_port_df)[1]])^(1/7)-1
## [1] 0.2265718
(1+X3_port_df$SP500[dim(X3_port_df)[1]])^(1/7)-1
## [1] 0.1106907
  • Miscellaneous Plots
# Volatility Plot
sp500_returns_new <- cbind(sp500,sp500_returns)
sp500_returns_new <- sp500_returns_new[,c(1,3)]
sp500_returns_df <- xts(sp500_returns_new[,-1], order.by=as.Date(sp500_returns_new[,1], "%m/%d/%Y"))

par(mfrow = c(1, 2), mai = c(1, 1, 1, 1))
chart.RollingPerformance(R=sp500_returns_df, width =12, FUN = "sd.annualized", main = "SP500 Rolling 12 months volatility")
par(new=TRUE)
chart.RollingPerformance(R=tp_portfolio_Return, width =12, FUN = "sd.annualized", main = "Tangency Portfolio Rolling 12 months volatility")